PerfectDou: Dominating DouDizhu with Perfect Information Distillation

Neural Information Processing Systems

As a challenging multi-player card game, DouDizhu has recently drawn much attention for analyzing competition and collaboration in imperfect-information games. In this paper, we propose PerfectDou, a state-of-the-art DouDizhu AI system that dominates the game, built in an actor-critic framework with a proposed technique named perfect information distillation.


Rethinking Fourier Transform from A Basis Functions Perspective for Long-term Time Series Forecasting

Neural Information Processing Systems

We propose to reconsider the Fourier transform from a basis functions perspective. Specifically, the real and imaginary parts of the frequency components can be viewed as the coefficients of cosine and sine basis functions at tiered frequency levels, respectively.
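This correspondence between frequency components and trigonometric basis coefficients can be checked with a naive DFT in plain Python. This is a minimal sketch of the standard identity, not the paper's method; the test signal and its amplitudes are made up for illustration:

```python
import cmath
import math

N = 64
# Test signal: a cosine at frequency 3 (amplitude 2.0) plus a sine at
# frequency 5 (amplitude 0.5).
x = [2.0 * math.cos(2 * math.pi * 3 * n / N) + 0.5 * math.sin(2 * math.pi * 5 * n / N)
     for n in range(N)]

def dft(signal):
    """Naive discrete Fourier transform: X[k] = sum_n x[n] e^{-2*pi*i*k*n/N}."""
    N = len(signal)
    return [sum(signal[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N // 2 + 1)]

X = dft(x)

# The real part of bin k is the cosine-basis coefficient (scaled by N/2);
# the negated imaginary part is the sine-basis coefficient.
cos_coeffs = [2 * c.real / N for c in X]
sin_coeffs = [-2 * c.imag / N for c in X]

print(round(cos_coeffs[3], 6), round(sin_coeffs[5], 6))  # 2.0 0.5
```

The recovered coefficients match the amplitudes used to build the signal, which is exactly the basis-function reading of the real and imaginary parts.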


ChemBOMAS: Accelerated BO in Chemistry with LLM-Enhanced Multi-Agent System

Han, Dong, Ai, Zhehong, Cai, Pengxiang, Lu, Shanya, Chen, Jianpeng, Ye, Zihao, Sun, Shuzhou, Gao, Ben, Ge, Lingli, Wang, Weida, Zhou, Xiangxin, Liu, Xihui, Su, Mao, Ouyang, Wanli, Bai, Lei, Zhou, Dongzhan, Xu, Tao, Li, Yuqiang, Zhang, Shufei

arXiv.org Artificial Intelligence

Bayesian optimization (BO) is a powerful tool for scientific discovery in chemistry, yet its efficiency is often hampered by sparse experimental data and a vast search space. Here, we introduce ChemBOMAS: a large language model (LLM)-enhanced multi-agent system that accelerates BO through synergistic data- and knowledge-driven strategies. Firstly, the data-driven strategy involves an 8B-scale LLM regressor fine-tuned on a mere 1% of labeled samples for pseudo-data generation, robustly initializing the optimization process. Secondly, the knowledge-driven strategy employs a hybrid Retrieval-Augmented Generation approach to guide the LLM in dividing the search space while mitigating LLM hallucinations. An Upper Confidence Bound algorithm then identifies high-potential subspaces within this established partition. Across the LLM-refined subspaces and supported by LLM-generated data, BO achieves improved effectiveness and efficiency. Comprehensive evaluations across multiple scientific benchmarks demonstrate that ChemBOMAS sets a new state of the art, accelerating optimization efficiency by up to 5-fold compared to baseline methods.
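The abstract does not give the exact bandit formulation, but the subspace-selection step can be sketched as a standard UCB1 rule over a fixed partition. All names, the exploration constant, and the toy objective below are illustrative assumptions, not ChemBOMAS internals:

```python
import math
import random

def ucb_select(subspace_stats, t, c=1.4):
    """Pick the subspace with the highest upper confidence bound.

    subspace_stats: list of (total_reward, n_evals) per subspace.
    t: current round, used in the exploration bonus sqrt(ln t / n).
    """
    best, best_score = None, -float("inf")
    for i, (total, n) in enumerate(subspace_stats):
        if n == 0:
            return i  # explore each unseen subspace first
        score = total / n + c * math.sqrt(math.log(t) / n)
        if score > best_score:
            best, best_score = i, score
    return best

# Toy objective: three hypothetical subspaces with different mean yields.
random.seed(0)
means = [0.2, 0.8, 0.4]
stats = [(0.0, 0) for _ in means]
for t in range(1, 101):
    i = ucb_select(stats, t)
    reward = random.gauss(means[i], 0.05)  # noisy "experiment"
    total, n = stats[i]
    stats[i] = (total + reward, n + 1)
# The high-mean subspace accumulates most of the evaluation budget.
```

The bonus term shrinks as a subspace is sampled, so the rule balances exploiting the best-looking region against probing uncertain ones, which is the role the abstract assigns to the UCB stage.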



Podcasts as a Medium for Participation in Collective Action: A Case Study of Black Lives Matter

Moldovan, Theodora, Pera, Arianna, Vega, Davide, Aiello, Luca Maria

arXiv.org Artificial Intelligence

We study how participation in collective action is articulated in podcast discussions, using the Black Lives Matter (BLM) movement as a case study. While research on collective action discourse has primarily focused on text-based content, this study takes a first step toward analyzing audio formats by using podcast transcripts. Using the Structured Podcast Research Corpus (SPoRC), we investigated spoken language expressions of participation in collective action, categorized as problem-solution, call-to-action, intention, and execution. We identified podcast episodes discussing racial justice after important BLM-related events in May and June of 2020, and extracted participatory statements using a layered framework adapted from prior work on social media. We examined the emotional dimensions of these statements, detecting eight key emotions and their association with varying stages of activism. We found that emotional profiles vary by stage, with different positive emotions standing out during calls-to-action, intention, and execution. We detected negative associations between collective action and negative emotions, contrary to theoretical expectations. Our work contributes to a better understanding of how activism is expressed in spoken digital discourse and how emotional framing may depend on the format of the discussion.


Retrieval-Augmented Clinical Benchmarking for Contextual Model Testing in Kenyan Primary Care: A Methodology Paper

Mutisya, Fred, Gitau, Shikoh, Syovata, Christine, Oigara, Diana, Matende, Ibrahim, Aden, Muna, Ali, Munira, Nyotu, Ryan, Marion, Diana, Nyangena, Job, Ongoma, Nasubo, Mbae, Keith, Wamicha, Elizabeth, Mibuari, Eric, Nsengemana, Jean Philbert, Chidede, Talkmore

arXiv.org Artificial Intelligence

Large Language Models (LLMs) hold promise for improving healthcare access in low-resource settings, but their effectiveness in African primary care contexts remains under-explored. We present a rigorous methodology for creating a benchmark dataset and evaluation framework focused on Kenyan Level 2-3 (dispensary and health center) clinical care. Our approach leverages retrieval-augmented generation (RAG) to ground questions and answers in Kenya's national clinical guidelines, ensuring content aligns with the local standard of care. The guidelines were digitised, chunked, and indexed for efficient semantic retrieval. Gemini Flash 2.0 Lite was then prompted with relevant guideline excerpts to generate realistic clinical questions, multiple-choice answers, and reasoning scenarios with source citations in English and Swahili. We engaged Kenyan physicians in a co-creation process to refine the dataset's relevance and fairness, and instituted a blinded expert validation pipeline to review for clinical accuracy, clarity, and cultural appropriateness. The resulting Alama Health QA dataset comprises thousands of regulator-aligned question-answer pairs spanning common outpatient conditions in English and Swahili. Beyond standard accuracy metrics, we propose innovative evaluation measures targeting clinical reasoning, safety, and adaptability. Initial results highlight significant performance gaps in state-of-the-art LLMs when confronted with localized scenarios, echoing recent findings that LLM accuracy on African medical questions lags behind performance on U.S. benchmarks. Our work demonstrates a pathway for dynamic, locally grounded benchmarks that can evolve with guidelines, providing a crucial tool for safe and effective deployment of AI in African healthcare. Advances in large language models have spurred interest in their potential to augment medical services, especially in low- and middle-income countries facing clinician shortages (Bekbolatova et al., 2024).
By handling routine queries or providing decision support, LLMs might help bridge gaps in healthcare access across Africa.
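The chunk-index-retrieve pipeline described above can be sketched with a purely lexical retriever. The paper uses semantic embeddings; the bag-of-words cosine, chunk size, and two-topic toy guideline text below are simplifying assumptions for illustration only:

```python
import math
import re
from collections import Counter

def chunk(text, size=12):
    """Split a guideline document into overlapping word windows."""
    words = text.split()
    step = size // 2  # 50% overlap so advice is not cut mid-thought
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - step, 1), step)]

def vectorize(text):
    """Bag-of-words term counts, a crude stand-in for a semantic embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=1):
    """Return the k indexed chunks most similar to the query."""
    qv = vectorize(query)
    return sorted(chunks, key=lambda c: cosine(qv, vectorize(c)), reverse=True)[:k]

# Hypothetical two-topic guideline text, not actual Kenyan guideline content.
guideline = ("Malaria: assess every child with fever using a rapid diagnostic test "
             "and treat confirmed cases promptly. Hypertension: measure blood pressure "
             "at every adult visit, counsel on salt reduction, and refer severe cases.")
chunks = chunk(guideline)
top = retrieve("child presenting with fever, suspected malaria", chunks)[0]
```

The retrieved chunk is then what a generator model would be prompted with, grounding each question-answer pair in a citable guideline excerpt.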


Enhancements for Developing a Comprehensive AI Fairness Assessment Standard

Agarwal, Avinash, Kumar, Mayashankar, Nene, Manisha J.

arXiv.org Artificial Intelligence

As AI systems increasingly influence critical sectors like telecommunications, finance, healthcare, and public services, ensuring fairness in decision-making is essential to prevent biased or unjust outcomes that disproportionately affect vulnerable entities or result in adverse impacts. This need is particularly pressing as the industry approaches the 6G era, where AI will drive complex functions like autonomous network management and hyper-personalized services. However, as AI applications diversify, the existing TEC Standard requires enhancement to strengthen its impact and broaden its applicability. This paper proposes an expansion of the TEC Standard to include fairness assessments for images, unstructured text, and generative AI, including large language models, ensuring a more comprehensive approach that keeps pace with evolving AI technologies. By incorporating these dimensions, the enhanced framework will promote responsible and trustworthy AI deployment across various sectors. The widespread adoption of Artificial Intelligence (AI) and Machine Learning (ML) technologies has driven transformative advancements across critical sectors, including telecommunications, healthcare, finance, and public services.


Navigating Semantic Relations: Challenges for Language Models in Abstract Common-Sense Reasoning

Gawin, Cole, Sun, Yidan, Kejriwal, Mayank

arXiv.org Artificial Intelligence

Large language models (LLMs) have achieved remarkable performance in generating human-like text and solving reasoning tasks of moderate complexity, such as question-answering and mathematical problem-solving. However, their capabilities in tasks requiring deeper cognitive skills, such as common-sense understanding and abstract reasoning, remain under-explored. In this paper, we systematically evaluate abstract common-sense reasoning in LLMs using the ConceptNet knowledge graph. We propose two prompting approaches: instruct prompting, where models predict plausible semantic relationships based on provided definitions, and few-shot prompting, where models identify relations using examples as guidance. Our experiments with the gpt-4o-mini model show that in instruct prompting, consistent performance is obtained when ranking multiple relations, but performance declines substantially when the model is restricted to predicting only one relation. In few-shot prompting, the model's accuracy improves significantly when selecting from five relations rather than the full set, although with notable bias toward certain relations. These results suggest that significant gaps remain between the abstract common-sense reasoning abilities of even commercially deployed LLMs and human-level understanding. However, the findings also highlight the promise of careful prompt engineering, based on selective retrieval, for obtaining better performance.
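The two prompting setups can be sketched as simple prompt builders. The relation subset, wording, and example triples below are illustrative assumptions, not the paper's exact prompts:

```python
# Hypothetical restricted subset of ConceptNet relations (assumption).
RELATIONS = ["IsA", "PartOf", "UsedFor", "CapableOf", "AtLocation"]

def instruct_prompt(head, tail, definitions):
    """Instruct prompting: the model ranks relations given their definitions."""
    defs = "\n".join(f"- {r}: {d}" for r, d in definitions.items())
    return (f"Relation definitions:\n{defs}\n"
            f"Rank the relations above by how plausibly they link "
            f"'{head}' to '{tail}'.")

def fewshot_prompt(head, tail, examples):
    """Few-shot prompting: the model picks one relation, guided by examples."""
    shots = "\n".join(f"{h} -> {t}: {r}" for h, r, t in examples)
    return (f"{shots}\n"
            f"Choose one relation from {', '.join(RELATIONS)}.\n"
            f"{head} -> {tail}:")

prompt = fewshot_prompt("wheel", "car",
                        examples=[("dog", "IsA", "animal"),
                                  ("kitchen", "AtLocation", "house")])
```

Restricting the candidate set in `fewshot_prompt` mirrors the five-relation condition under which the paper reports the accuracy gain.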


Can you pass that tool?: Implications of Indirect Speech in Physical Human-Robot Collaboration

Zhang, Yan, Ratnayake, Tharaka Sachintha, Sew, Cherie, Knibbe, Jarrod, Goncalves, Jorge, Johal, Wafa

arXiv.org Artificial Intelligence

Indirect speech acts (ISAs) are a natural pragmatic feature of human communication, allowing requests to be conveyed implicitly while maintaining subtlety and flexibility. Although advancements in speech recognition have enabled natural language interactions with robots through direct, explicit commands--providing clarity in communication--the rise of large language models presents the potential for robots to interpret ISAs. However, empirical evidence on the effects of ISAs on human-robot collaboration (HRC) remains limited. To address this, we conducted a Wizard-of-Oz study (N=36), engaging a participant and a robot in collaborative physical tasks. Our findings indicate that robots capable of understanding ISAs significantly improve humans' perception of robot anthropomorphism, team performance, and trust. However, the effectiveness of ISAs is task- and context-dependent, thus requiring careful use. These results highlight the importance of appropriately integrating direct and indirect requests in HRC to enhance collaborative experiences and task performance.